New sgr cloud commands, fleshing out the splitgraph.yml (ex-repositories.yml), codegen for Splitgraph Cloud projects. #582

Merged
mildbyte merged 69 commits into master from feature/sgr-cloud-commands on Dec 16, 2021

@mildbyte
Contributor

New sgr cloud commands

These let users manipulate Splitgraph Cloud and ingestion jobs from the CLI:

  • sgr cloud status: view the status of ingestion jobs in the current project
  • sgr cloud logs: view job logs
  • sgr cloud csv: upload a CSV file to Splitgraph Cloud (without using the engine)
  • sgr cloud sync: trigger a one-off load of a dataset
  • sgr cloud stub: generate a splitgraph.yml file
  • sgr cloud seed: generate a Splitgraph Cloud project with a splitgraph.yml, GitHub Actions, dbt etc
  • sgr cloud validate: merge multiple project files and output the result (like docker-compose config)

splitgraph.yml

Change various commands that use repositories.yml to default to splitgraph.yml instead. Allow "mixing in" multiple .yml files Docker Compose-style (mostly useful for keeping credentials separate from data settings, so that the credentials don't have to be checked in).

Wrote some documentation on the new format, GitHub Actions workflow reference-style (a header for every field with its full path in the YAML). It temporarily lives here while we can't easily deploy the docs site: https://github.com/splitgraph/splitgraph.com/blob/f7ac524cb5023091832e8bf51b277991c435f241/content/docs/0900_splitgraph-cloud/0500_splitgraph-yml.mdx
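To illustrate the "mixing in" idea, here is a minimal sketch of a Docker Compose-style recursive merge of two already-parsed project files. The function name `merge_project_files`, the key names, and the exact merge rules (later file wins, nested mappings merged key by key) are assumptions for illustration, not the behaviour or schema that `sgr cloud validate` is confirmed to use.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_project_files(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `override` into a copy of `base` (hypothetical helper).

    Nested mappings are merged key by key; any other value in `override`
    replaces the value in `base`. Neither input is mutated.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_project_files(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical contents of two parsed YAML files: one with data settings
# (safe to check in) and one with credentials (kept out of version control).
# The layout is illustrative only, not the real splitgraph.yml schema.
data_settings = {
    "repositories": {"demo/sales": {"plugin": "postgres_fdw", "schedule": "daily"}},
}
credentials = {
    "credentials": {"my_db": {"username": "reader", "password": "secret"}},
    "repositories": {"demo/sales": {"schedule": "hourly"}},
}

project = merge_project_files(data_settings, credentials)
print(project)
```

In this sketch the second file both adds a new top-level section and overrides a single nested setting, which is the usual reason to split a project into a checked-in file and a private one.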

Sample project generation

sgr cloud seed generates a sample Splitgraph Cloud project from a base64-encoded "seed" (e.g. eyJuYW1lc3BhY2UiOiJtaWxkYnl0ZSIsInBsdWdpbnMiOlsicG9zdGdyZXNfZmR3Iiwic25vd2ZsYWtlIl0sImluY2x1ZGVfZGJ0Ijp0cnVlfQo=).

This is mostly for our marketing website which will let people "check out" with a Splitgraph Cloud project that contains their chosen data sources + a dbt transformation. Interested CLI users can still use it by encoding a JSON as base64:

{"namespace":"mildbyte","plugins":["postgres_fdw","snowflake"],"include_dbt":true}

and passing it to sgr cloud seed.
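As a rough sketch of the encoding step, the snippet below serialises the JSON object above and encodes it as base64 using only the Python standard library. The helper names `encode_seed` and `decode_seed` are hypothetical, and it is an assumption that `sgr cloud seed` expects the standard (not URL-safe) base64 alphabet.

```python
import base64
import json
from typing import Any, Dict


def encode_seed(seed: Dict[str, Any]) -> str:
    """Serialise a seed dict to compact JSON and base64-encode it (hypothetical helper)."""
    raw = json.dumps(seed, separators=(",", ":")).encode("utf-8")
    # Assumption: standard base64 alphabet rather than the URL-safe variant.
    return base64.b64encode(raw).decode("ascii")


def decode_seed(encoded: str) -> Dict[str, Any]:
    """Reverse of encode_seed, handy for checking what a seed contains."""
    return json.loads(base64.b64decode(encoded))


seed = {
    "namespace": "mildbyte",
    "plugins": ["postgres_fdw", "snowflake"],
    "include_dbt": True,
}

encoded = encode_seed(seed)
print(encoded)
print(decode_seed(encoded))
```

The encoded string may differ from the example seed quoted earlier if the original JSON contained different whitespace; decoding either string should give the same object.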

The intended usage is:

Miscellaneous

  • Add lightweight SVG icons to builtin plugins (not used in the CLI but used in Splitgraph Cloud).
  • Allow setting initial repo visibility in sgr cloud sync/sgr cloud load (pass --initial-private to create the repo as private if it doesn't yet exist)
  • Use Unified GQL API instead of separate GQL endpoints
  • Start using pytest-snapshot for tests that involve asserting long CLI/file outputs

(WIP, still need to unpack the response nicer and fix mypy)

WIP: doesn't use settings from repositories.yml

Required a small wrapper for yaml.safe_load/safe_dump to avoid deprecation
warnings, but otherwise a drop-in replacement.

(bring it back in line with the PyYAML output which adds a line break after
every dict element)

Limitations:

  - Isn't/can't be aware of the tables in the source repositories, so we have
    a placeholder there
  - Using a placeholder for the Git URL so that we can inject the repo URL
    at runtime in a GitHub Action

Optionally add a final stage to the GHA pipeline running dbt against all loaded
repos.

Also set repositories as live/not live based on whether they support mount. For
repositories that don't support mount, run ingestion as previously and use
`sgr cloud load` to set up metadata. For live repos, use `sgr cloud load` to set
up metadata and the external data source settings.

(including defaults and tests).

Also delete the inline repositories.yml format documentation from the
`sgr cloud load` commandline (wrote actual docs).

Default still public; override with `--initial-private`

Run the `sgr cloud sync` first with `--initial-private` so that the user's repo
by default becomes private; only then run `sgr cloud load` to set up the metadata.

Doing it vice versa will make `sgr cloud load` create a public repo (and if we're
doing `--skip-external`, we'll only be implicitly creating the repo through
the Postgraphile API where we can't edit initial visibility settings).

Wire it to the `AddExternalRepositoryRequest` model.

…aded.

Avoid redundantly setting up credentials if we're running multiple `sgr cloud load`
instances from different jobs (otherwise it'll upload all credentials for every
repository in `splitgraph.yml` in every job). This is idempotent but still a
waste of time.

Log the errors for credential/add-external endpoints (for credentials, the
JSONSchema error text also quotes the original object, so we mask it unless the
user runs the command with `--verbosity DEBUG`).

mildbyte merged commit 544dc66 into master on Dec 16, 2021
mildbyte added a commit that referenced this pull request Dec 17, 2021
Fleshing out the `splitgraph.yml` (aka `repositories.yml`) format that defines a Splitgraph Cloud "project" (datasets, their sources and metadata).

Existing users of `repositories.yml` don't need to change anything, though note that `sgr cloud` commands using the YAML format will now default to `splitgraph.yml` unless explicitly set to `repositories.yml`.


New sgr cloud commands:

See #582 and #587

These let users manipulate Splitgraph Cloud and ingestion jobs from the CLI:

  * `sgr cloud status`: view the status of ingestion jobs in the current project
  * `sgr cloud logs`: view job logs
  * `sgr cloud upload`: upload a CSV file to Splitgraph Cloud (without using the engine)
  * `sgr cloud sync`: trigger a one-off load of a dataset
  * `sgr cloud stub`: generate a `splitgraph.yml` file
  * `sgr cloud seed`: generate a Splitgraph Cloud project with a `splitgraph.yml`, GitHub Actions, dbt etc
  * `sgr cloud validate`: merge multiple project files and output the result (like `docker-compose config`)
  * `sgr cloud download`: download a query result from Splitgraph Cloud as a CSV file, bypassing time/query size limits.


repositories.yml/splitgraph.yml format:

Change various commands that use `repositories.yml` to default to `splitgraph.yml` instead. Allow "mixing in" multiple `.yml` files Docker Compose-style, useful for splitting credentials (and not checking them in) and data settings.

Temporary location for the new full documentation on `splitgraph.yml`: https://github.com/splitgraph/splitgraph.com/blob/f7ac524cb5023091832e8bf51b277991c435f241/content/docs/0900_splitgraph-cloud/0500_splitgraph-yml.mdx


Miscellaneous:

  * Initial backend support for "transforming" Splitgraph plugins, including dbt (#574)
  * Dump scheduled ingestion/transformation jobs with `sgr cloud dump` (#577)